Spark Workflow
A typical Spark workflow consists of ingestion, processing, storage, and analytics –
- Ingest data from a source – HDFS, NoSQL stores, S3, real-time sources, etc.
 
- Transform data – filter, clean, join, enrich
 
- Persist processed data – memory, HDFS, NoSQL stores
 
- Run interactive analytics – shells, Spark SQL, third-party tools
 
- Apply machine learning – e.g., Spark's MLlib library
- Trigger actions – operations that cause the lazily built pipeline to execute
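
The steps above can be sketched end-to-end in Scala. This is a minimal illustration, not a definitive implementation: the paths, column names, and schemas are hypothetical, and it assumes a Spark 3.x `SparkSession` running in local mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("workflow-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Ingest: read raw events from a source (path is hypothetical)
    val events = spark.read.json("hdfs:///data/raw/events")

    // Transform: filter, clean, join, enrich
    val users = spark.read.parquet("hdfs:///data/dim/users")
    val cleaned = events
      .filter($"status" === "ok")           // filter
      .na.drop(Seq("userId"))               // clean: drop rows missing a key
      .join(users, Seq("userId"))           // join against a dimension table
      .withColumn("day", to_date($"ts"))    // enrich with a derived column

    // Persist: cache in memory and write back to durable storage
    cleaned.cache()
    cleaned.write.mode("overwrite").parquet("hdfs:///data/processed/events")

    // Interactive analytics via Spark SQL
    cleaned.createOrReplaceTempView("events")
    val daily = spark.sql("SELECT day, count(*) AS n FROM events GROUP BY day")

    // Action: show() triggers execution of the lazy pipeline above
    daily.show()

    spark.stop()
  }
}
```

Note that everything before `daily.show()` only builds a logical plan; Spark does no work until an action (such as `show`, `count`, or the `write`) forces evaluation.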
Each of these tasks is explained in detail in later sections.